Gromov-Hausdorff stability of linkage-based hierarchical clustering methods
Abstract
A hierarchical clustering method is stable if small perturbations of the data set produce small perturbations in the result. These perturbations are measured using the Gromov-Hausdorff metric. We study the stability of linkage-based hierarchical clustering methods. We show that, under some basic conditions, standard linkage-based methods are semi-stable, meaning that they are stable if the input data is close enough to an ultrametric space. We also prove that, apart from exotic examples, introducing any unchaining condition in the algorithm always produces unstable methods.
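As a rough illustration of the stability notion described in the abstract, the sketch below compares the output of a clustering method before and after a small perturbation of the input distances. It is not taken from the paper. It uses single linkage, chosen only because it is the simplest linkage-based method to write down, and it measures perturbations by the largest entrywise difference between matrices on a fixed labelling of the points, which is only an upper-bound proxy for (twice) the Gromov-Hausdorff distance. The function names `single_linkage_ultrametric` and `sup_diff` and the four-point data set are hypothetical.

```python
def single_linkage_ultrametric(d):
    """Return the single-linkage ultrametric of a symmetric distance matrix d.

    u[i][j] is the smallest threshold t such that i and j can be joined by a
    chain of points whose consecutive distances are all at most t.
    """
    n = len(d)
    u = [row[:] for row in d]
    # Minimax-path variant of Floyd-Warshall: allow chains through point k.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


def sup_diff(a, b):
    """Largest entrywise difference between two matrices of equal size."""
    n = len(a)
    return max(abs(a[i][j] - b[i][j]) for i in range(n) for j in range(n))


# Hypothetical data: four points on a line at positions 0, 1, 4, 5.
d = [[0.0, 1.0, 4.0, 5.0],
     [1.0, 0.0, 3.0, 4.0],
     [4.0, 3.0, 0.0, 1.0],
     [5.0, 4.0, 1.0, 0.0]]

# The same points slightly moved, to positions 0, 1.2, 4, 5.1.
d_perturbed = [[0.0, 1.2, 4.0, 5.1],
               [1.2, 0.0, 2.8, 3.9],
               [4.0, 2.8, 0.0, 1.1],
               [5.1, 3.9, 1.1, 0.0]]

u = single_linkage_ultrametric(d)
u_perturbed = single_linkage_ultrametric(d_perturbed)

print("input perturbation: ", sup_diff(d, d_perturbed))
print("output perturbation:", sup_diff(u, u_perturbed))
```

For single linkage on a fixed point set, the output perturbation in this sense never exceeds the input perturbation. The sketch only illustrates what "small perturbations produce small perturbations" means; it says nothing about the semi-stability or instability results for the other methods studied in the paper.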
Similar resources
Characterization, Stability and Convergence of Hierarchical Clustering Methods
We study hierarchical clustering schemes under an axiomatic view. We show that within this framework, one can prove a theorem analogous to one of Kleinberg (2002), in which one obtains an existence and uniqueness theorem instead of a non-existence result. We explore further properties of this unique scheme: stability and convergence are established. We represent dendrograms as ultrametric space...
Computing the Shape of Brain Networks Using Graph Filtration and Gromov-Hausdorff Metric
The difference between networks has often been assessed by the difference of global topological measures such as the clustering coefficient, degree distribution and modularity. In this paper, we introduce a new framework for measuring the network difference using the Gromov-Hausdorff (GH) distance, which is often used in shape analysis. In order to apply the GH distance, we define the shape of ...
Temporal Hierarchical Clustering
We study hierarchical clusterings of metric spaces that change over time. This is a natural geometric primitive for the analysis of dynamic data sets. Specifically, we introduce and study the problem of finding a temporally coherent sequence of hierarchical clusterings from a sequence of unlabeled point sets. We encode the clustering objective by embedding each point set into an ultrametric spa...
Choosing the Best Hierarchical Clustering Technique Based on Principal Components Analysis for Suspended Sediment Load Estimation
1- INTRODUCTION The assessment of watershed sediment load is necessary for controlling soil erosion and reducing the potential of sediment production. Different estimates of sediment amounts, along with the lack of long-term measurements, limit the accessibility to reliable data series of erosion rate and sediment yield. Therefore, the observed data of suspended sediment load could be used to ...
Improved Error Bounds for Tree Representations of Metric Spaces
Estimating optimal phylogenetic trees or hierarchical clustering trees from metric data is an important problem in evolutionary biology and data analysis. Intuitively, the goodness-of-fit of a metric space to a tree depends on its inherent treeness, as well as other metric properties such as intrinsic dimension. Existing algorithms for embedding metric spaces into tree metrics provide distortio...
Journal title:
- CoRR
Volume abs/1311.5068
Publication date 2013